Chapter 3 Introduction: Tackling Non-Linear Classification

We have moved beyond the limitations of linear models, which struggle to classify data that cannot be separated by a straight line. Today, we will use the PyTorch workflow to build a deep neural network (DNN) that can learn the complex, non-linear decision boundaries required for real-world classification.

1. Visualizing the Need for Non-Linear Data

Our first step is to create a challenging synthetic dataset, such as the two-moons structure, to show clearly why a basic linear model fails. The two classes form interlocking arcs, so no single straight line can separate them. This forces us to use a deep architecture to approximate the curve needed to split the classes.

Data Properties

Data structure: synthetic feature data (e.g., $1000 \times 2$ for $1000$ samples with 2 features each)
Output type: a single probability value, typically torch.float32, representing class membership
Goal: to produce a curved decision boundary through layered computation
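The data properties above can be sketched in code. The following is a minimal, hand-rolled two-moons generator, not the lesson's own dataset code: the function name `make_two_moons`, the noise level, and the arc offsets are all assumptions chosen for illustration.

```python
import math
import torch


def make_two_moons(n_samples=1000, noise=0.1, seed=0):
    """Hypothetical helper: returns X of shape (n_samples, 2) and y of shape (n_samples, 1)."""
    gen = torch.Generator().manual_seed(seed)
    n_upper = n_samples // 2
    n_lower = n_samples - n_upper

    # Upper moon: points on a half circle from angle 0 to pi (class 0).
    t_upper = torch.rand(n_upper, generator=gen) * math.pi
    upper = torch.stack([torch.cos(t_upper), torch.sin(t_upper)], dim=1)

    # Lower moon: mirrored half circle, shifted so the arcs interlock (class 1).
    t_lower = torch.rand(n_lower, generator=gen) * math.pi
    lower = torch.stack([1.0 - torch.cos(t_lower), 0.5 - torch.sin(t_lower)], dim=1)

    X = torch.cat([upper, lower], dim=0)
    X = X + noise * torch.randn(X.shape, generator=gen)  # Gaussian jitter

    # Labels are float32 with shape (N, 1) so they can be compared to probabilities.
    y = torch.cat([torch.zeros(n_upper, 1), torch.ones(n_lower, 1)], dim=0)
    return X, y


X, y = make_two_moons()
print(X.shape, X.dtype)  # torch.Size([1000, 2]) torch.float32
print(y.shape, y.dtype)  # torch.Size([1000, 1]) torch.float32
```

Storing the labels as float32 with shape $(N, 1)$ is a design choice that lets them be compared directly against a single-probability output.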

The Power of Non-Linear Activation Functions

The core principle of a DNN is introducing non-linearity into the hidden layers through functions such as ReLU. Without them, stacking layers yields nothing more than one large linear model, no matter how deep the network is, because a composition of linear maps is itself a linear map.
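The collapse of stacked linear layers can be checked numerically. This is a small sketch with arbitrary layer sizes (2, 8, 1) picked for illustration: it folds two `nn.Linear` layers into a single weight matrix and bias, then compares the outputs.

```python
import torch
from torch import nn

torch.manual_seed(0)

lin1 = nn.Linear(2, 8)
lin2 = nn.Linear(8, 1)
x = torch.randn(5, 2)

# Two stacked linear layers with NO activation in between.
stacked = lin2(lin1(x))

# The same mapping written as one linear layer:
#   W = W2 @ W1,  b = W2 @ b1 + b2
W = lin2.weight @ lin1.weight            # shape (1, 2)
b = lin2.weight @ lin1.bias + lin2.bias  # shape (1,)
collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-5))  # True

# Inserting ReLU between the layers breaks this equivalence in general.
with_relu = lin2(torch.relu(lin1(x)))
print(torch.allclose(with_relu, collapsed, atol=1e-5))
```

The first comparison holds for any choice of weights. The second depends on the inputs, but it generally fails as soon as any hidden pre-activation is negative.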

Question 1

What is the primary purpose of the ReLU activation function in a hidden layer?

Introduce non-linearity so deep architectures can model curves

Speed up matrix multiplication

Ensure the output remains between 0 and 1

Normalize the layer output to a mean of zero

Question 2

Which activation function is required in the output layer for a binary classification task?

Sigmoid

Softmax

ReLU

Question 3

Which loss function corresponds directly to a binary classification problem using a Sigmoid output?

Binary Cross Entropy Loss (BCE)

Mean Squared Error (MSE)

Cross Entropy Loss

Challenge: Designing the Core Architecture

Integrating architectural components for non-linear learning.

You must build an nn.Module for the two-moons task. Input features: 2. Output units: 1 (a probability).

Step 1

Describe the flow of computation for a single hidden layer in this DNN.

Solution:
Input $\to$ Linear Layer (Weight Matrix and Bias) $\to$ ReLU Activation $\to$ Output to Next Layer.
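As a rough illustration of that flow, one hidden layer can be written as a linear transform followed by ReLU. The hidden width of 16 and the batch size of 4 are arbitrary assumptions, not values from the lesson.

```python
import torch
from torch import nn

# One hidden layer: linear transform, then element-wise ReLU.
hidden = nn.Sequential(
    nn.Linear(2, 16),  # z = x @ W.T + b  (16 hidden units is an arbitrary choice)
    nn.ReLU(),         # a = max(0, z)
)

x = torch.randn(4, 2)  # a batch of 4 samples with 2 features each
a = hidden(x)          # activations handed to the next layer

print(a.shape)               # torch.Size([4, 16])
print(bool((a >= 0).all()))  # True: ReLU never outputs negative values
```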

Step 2

What must the final layer's output size be if the input shape is $(N, 2)$ and we use BCE loss?

Solution:
The final layer must have a single output unit, so the model's output has shape $(N, 1)$: one probability score per sample, matching the label shape.
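Putting both steps together, one possible shape for the module is sketched below. The class name `MoonsClassifier`, the two hidden layers, and the width of 16 are illustrative assumptions. Only the 2 input features, the single Sigmoid output, and the BCE loss come from the challenge.

```python
import torch
from torch import nn


class MoonsClassifier(nn.Module):  # hypothetical name
    def __init__(self, hidden_units=16):  # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, 1),  # single output unit
            nn.Sigmoid(),                # squashes the score into (0, 1)
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
model = MoonsClassifier()

x = torch.randn(8, 2)                         # stand-in batch, N = 8
labels = torch.randint(0, 2, (8, 1)).float()  # shape (N, 1), float32

probs = model(x)
loss = nn.BCELoss()(probs, labels)

print(probs.shape)  # torch.Size([8, 1])
print(loss.item())
```

In practice, many PyTorch users drop the final `nn.Sigmoid` and train on raw scores with `nn.BCEWithLogitsLoss`, which is more numerically stable. The version above keeps the explicit Sigmoid to mirror the quiz answers.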